Technical Report ICMA-92-175 July, 1992
Multidimensional Computer Simulation
of Stirling Cycle Engines
by
C. Hall and T. Porsching
Institute for Computational Mathematics and Applications
Department of Mathematics and Statistics
University of Pittsburgh, Pittsburgh, PA 15260
Final Technical Report
for
NASA-Grant NAG3-1097
MULTIDIMENSIONAL COMPUTER SIMULATION
OF STIRLING CYCLE ENGINES
1. Introduction
This report summarizes the activity performed under NASA-Grant NAG3-1097 during 1991.
During that period, work centered on the following tasks: (1) To investigate more efficient solvers for
ALGAE, (2) To modify the plotting package for ALGAE, and (3) To validate ALGAE by simulating
oscillating flow problems similar to those studied by Kurzweg and Ibrahim et. al. [8] - [10].
2* New Solution Methods for ALGAE
This section presents a description of the new direct and iterative solution methods that supple-
ment the original frontal method in the computer program ALGAE. It also discusses a new iterative
method for the solution of the discrete energy equations.
2.1. Bordered Banded Solver
The bordered banded solver implemented in ALGAE makes use of customized CRAY Y/MP rou-
tines. A genera] reference for direct solver strategies may be found in, for example, [1]. The ALGAE
algorithm is basically a block Gaussian elimination strategy which exploits the structure of the dual
variable system and the Cray library routines in LAPACK, [2]. Also, unlike the frontal solver in
ALGAE, the bordered banded solver makes no use of disk storage for problems as large as 10,000 flow
cells. Timing comparisons are included which indicate a speedup factor of 5 to 6 for moderate size
problems and a factor of 4 to 5 for larger problems when compared with an optimized frontal solver.
These factors are considerably larger (50 to 200) when the bordered banded solver times are compared
to those of the origiml (single precision) ALGAE frontal solver.
2.1.1. Reordering of the equations and dual variables
- 2 -
The system that is solved at each time step in ALGAE is composed of equations having two
separate nonzero structures. Normally an equation in the dual variable system has a zero-nonzero
structure which is similar to that resulting from the 13 point finite difference stencil approximating the
biharmonic operator. That is, there are at most 13 nonzero entries in the equation, and the semi-
bandwidth is generally 2N for an N by N flow cell problem. The second type of equation arises when
there are obstructions in the flow region or if the flow is pressure driven. In such cases there are equa-
tions in which there are generally many more nonzero entries and these may occur outside the band of
width 4N. In terms of network terminology, cycles or countries that are composed of 4 or less links
give rise to equations that are of the first type, while cycles that circumscribe obstacles or connect pres-
sure specified boundary segments may give rise to equations of the second type.
The first modification to the solution strategy in ALGAE is a reordering of the unknown dual
variables and equations so that the more complex type 2 equations occur last in the system. The result-
ing system is then bordered banded and can be written as
B X = K,
where B is a block matrix of the form
_ B n B n
B 2 \ B 2 2
The matrix B u is banded and very sparse, the dimension of B 22 is usually quite small, and the matrices
in the border (B 12 , B 21/ and B 22 ) are generally dense. Note that B 22 may be void, i.e., the matrix B on
occasion may be banded. It is assumed that X =(X Ir X 2 ) and K = (K u K 2 ) are partitioned consistently
with the partitioning of B .
2.1.2. Storing nonzero entries
To eliminate the use of disk I/O during the solution of the dual variable system, the positions
(/, /) of possible nonzero entries in the dual variable system are calculated once and for all. These posi-
tions are the same for each time step of the transient. Given these positions, only the nonzero entries of
the system are calculated.
- 3 -
2.1.3. Bordered banded solution algorithm
The bordered banded algorithm used is contained in the software package BORMAT available
from the Pittsburgh Supercomputer Center [3]. Note that the BLAS, LAPACK and LINPACK routines
used in BORMAT are available on the CRAY in SCILIB. (The LAPACK option is used in BORMAT).
The bordered banded system is factored and the resulting factored system is then solved. The
process is as follows:
(1) B u is factored and the L-U factors L\\U n are stored.
(2) The system
L\\U ]] Z = -B 12 ,
is solved and B 12 is overwritten with the result.
(3) Gaussian elimination of the two block rowed system requires the calculation of
B ' 22 = B 22 - B2\(L n U u ) 1 B\2 = B 22 + B 2]Z
using the Z in step (2). The result is stored in B 2 2 *
(4) The new (2,2) block determined in step (3)
B ' 27 ~ L 22 U 22 f
is factored and the result stored in the same space.
(5) The subsystem
^11^11 Y \ = Ki .
is solved.
(6) K 2 is replaced by K 2 - B 2 \ Y 1 .
(7) The subsystem
(L 22 L/ 22 ) x 2 = k 2 .
is solved.
- 4 -
(8) The solution X t is calculated as
X t = Y, + B 21 X 2 .
2.1.4. Performance Data
Four data files were used to initially test the modified ALGAE solution options. The first prob-
lem involved a preheater section geometry, the second dealt with flow through an aircraft compart-
ment containing obstacles, and the third and fourth were driven cavity geometries. The following are
the statistics for these problems. NOB denotes the dimension of B 22 -
Problem
Flow cells
Dual variables
NOB
Semi-bandwidth
Preheater
1
45
Aircraft compartment
1174
1065
3
60
Small cavity
2500
2401
0
98
Large cavity
10000
9801
0
198
Three solvers were tested; the original frontal solver, an optimized version of the frontal solver,
and finally the bordered banded solver described above. The following are timings for a CRAY Y/MP
using one processor. (This enhances the performance of the bordered banded solver and has very little
effect on that of the frontal solver.)
Each problem was run for ten steps. The following table lists the average time in seconds to solve
the dual variable system for each of the four problems. Due to the excessive cost, the large cavity prob-
lem timings for the original frontal scheme (designated *) are for a single step. Also included are the
ratios of the solve times for the optimized frontal solver to those of the bordered banded solver.
Average Time to Solve One System
Problem
Original frontal
Optimized frontal
Bordered banded
Ratio
Preheater
10.52
0.51
0.05
9.3
Aircraft compartment
16.94
0.82
0.09
8.3
Small cavity
38.46
2.04
0.32
6.4
Large cavity
191.08*
14.22
3.52
4.0
The next table contains the CPU charges on the Y/MP for the ten steps of each of the runs. This
includes setup, solution times and I/O for the entire run. The CPU time (designated **) for the original
frontal solver was determined by multiplying that for a single step by ten. As above, the ratios of these
- 5 -
times for the optimized frontal and bordered banded solvers are also given. Note that there is a one
time additional cost for the setup of the bordered-banded solver. These ratios would increase if this
cost was amortized over more steps.
Total CPU Time Charged
Problem
Original frontal
Optimized frontal
Bordered banded
Ratio
Preheater
108.29
7.49
2.99
2.5
Aircraft compartment
171.68
10.52
3.45
3.0
Small cavity
388.26
24.09
7.38
3.3
Large cavity
1 969.5**
154.45
53.13
2.9
In addition to the above data, note that the wall clock times for the optimized frontal method
were consistently 2 to 3 times greater than those for the bordered banded solver. Presumably this is a
reflection of the need for disk I/O in the former. Of course, the wall clock time depends on the com-
puting environment and the level of usage of the machine at any given time.
2.2. Iterative Solvers
The iterative solvers that have been incorporated into the ALGAE program are based on a pack-
age of preconditioned conjugate gradient methods developed by the Mathematical Software Research
Group at Cray Research, Inc.. A discussion of the package and details of its usage are given in [41 and
151 .
The NAMELIST/ START/ variable ISOLV selects the type of solver to be used in ALGAE as fol-
lows:
ISOLV =
0
1
2
3
optimized frontal method (default)
bordered-banded direct method
pure iterative method
hybrid method
Note that in addition to the pure iterative solver, a hybrid solver is included that incorporates selected
calls to the bordered-banded direct solver. The intention is to combine the computational efficiency of
an iterative solver with the robustness of a direct solver. The strategy controlling the use of these
solvers is discussed in section 1 .2.3.
- 6 -
2 . 2 . 1 . Data Structure
Let A x = b denote the dual variable system. The sparse column format is used to represent the
coefficient matrix A. In this format the row indices of the (potentially) nonzero entries of the ;-th
column are stored in ascending order. The corresponding values are also stored. There is also a
pointer array containing pointers to the locations of the row index and value of the first nonzero entry
in each of the columns.
2.2.2. Dual Variable System Generation
The dual variable coefficient matrix A is a product A = C T QC, where C is a fundamental matrix
for the network defined by the finite difference mesh, and Q is the coefficient matrix of the discrete
momentum equations. Thus, if A = J, C = tc^ m 1 and Q = [^ v l/ then
&ij = fij vj ■
H-v
Now the data structures of ALGAE are such that the matrix C is defined by the link numbers. There-
fore, A is generated by determining for each p, v pair the i , j pairs for which a.j receives a contribution
of the form c^cf^ v c v j, and then accumulating the sums.
The driver for the iterative solver uses an array to map the contributions to the a,j onto their
proper locations in the array containing the matrix values. This avoids conditional testing and
enhances the vectorization process.
2 . 2 . 3 . Program Control
For the pure iterative solver, the designated method is used with (or without) user selected
preconditioning at each time step. If a fatal error is detected during the course of an iteration, or if the
solution has not satisfied the convergence criterion in the user designated maximum number of itera-
tions, a message is written and the computations are terminated.
For the hybrid solver, the solution is computed using the bordered-banded direct solver at the
first time step and the preconditioning toggle is turned on. For each subsequent time step, an iterative
solution is attempted. An incomplete LU factorization is computed to precondition the system if the
- 7-
toggle is on, and the toggle is then turned off. Otherwise, the iterative method uses the most recently
computed incomplete LU factors as the preconditioner. The iterative solution is accepted provided
that an internal convergence criterion is satisfied and provided that IL4x* - bIL ^ 10^* / lbc*IL, where x* is
the iterative solution. Otherwise, the system is resolved using the bordered-banded direct solver and
the preconditioning toggle is turned on.
2,2.4. Performance Data
The first problem considered is the preheater problem. This is a modified version of the sample
problem given in Section 6, Volume 2 of [6]. There are 750 flow cells and 700 dual variables. The tim-
ings in the following table are based on the average solution time over thirty consecutive time steps
using a step size of .5 seconds. The original frontal method required 10.52 sec/solve, while the optim-
ized frontal method took .51 sec/solve. The column headed RATIO gives the ratio of the optimal fron-
tal solve time to that of the iterative solve time, and the last column gives the maximum pressure drop
(psi) around the cycles after the final time step. This quantity would be zero for an exact solution, and
so it may be used as a measure of the accuracy of the iterative solution.
Method
ISOLV
Sec /Solve
RATIO
Max. Loop Ap
CGS
2
.15
iia
3
.17
SB
GMR
2
.14
3.6
3
.19
2.7
The second set of sample timings corresponds to a large Poiseuille-type problem with heat added
to the central portion of the channel. There are 10000 flow cells and 9900 dual variables. The sample
times of the following table are based on the average solution time over 10 consecutive time steps
using a step size of 1.0. The original and optimized frontal methods required 193.5 and 15.3 sec/solve.
The conventions used in describing the previous table again apply.
Method
ISOLV
Sec/Solve
RATIO
Max. Loop Ap
CGS
2
3.42
4.5
.12x10^
3
4.34
3.5
.14xl0 -g
GMR
2
3.04
5.0
.48x1 0 -6
3
4.83
3.2
.14xl0~ g
- 8 -
23. Energy Equation
The original point Causs-Seidel method used in ALGAE to solve the discrete energy equations
has been replaced by the point Jacobi method. For model problems the asymptotic rate of convergence
of the Jacobi method is not as large as that of the Gauss-Seidel method. (See, for example, (7|.) How-
ever, it is amenable to vectorization, whereas the Gauss-Seidel method is not.
Sample timings were made on the two problems used to test the iterative solvers. The following
table shows that for these problems the new method resulted in an average speed-up factor (RATIO)
that is slightly greater than three.
Problem
No. Time Steps
Avg. Solve Times (Sec.)
RATIO
Gauss-Seidel
Jacobi
Preheater
30
.044
3.4
Heated Channel
10
.328
.108
3.04
3. Modification of Plotting Package for ALGAE
The original plotting package for ALGAE assumed that all walls were adiabatic. This was also
the assumption made in the fluid modeling by ALGAE, and was reasonable for thermal transfer in
reactor components. The ability to allow for conduction from solid walls was added to the code
ALGAE some years ago, but the plotting package was not changed to adequately display temperature
distributions near solid walls. As such, some thermal contours, while completely accurate one flow
cell from the wall, would terminate orthogonal to the wall. The plotting package has been updated
this year and now displays the correct contour levels up to and including the walls.
4. Oscillating Flow in a Regenerator Component
As part of the validation of ALGAE for Stirling Engine simulations, we consider a model of a
parallel plate regenerator similar to that studied by Ibrahim et. al. [8], 19]. Figure 1 shows a schematic
of the geometry and lists the assumed physical properties of the working gas (Helium) and wire 1 .
Also shown are the boundary conditions and mesh spacings used by ALGAE. Note that the angular
frequency and amplitude of the oscillating pressure drop forcing function are 412.5 sec -1 and .0349 psi
1 These were suggested by M. Ibrahim in a private communication, dated Dec 11, 1991.
- g .
p(t)
T =436.50 °F
Free slip adiabatic wall
R i = radius of channel = .2 1 8667 E-03 ft
R 2 = radius of wire = .41666 E-04 ft
L = length of channel = .3 E-02 ft
Ay =. 13975 E-04, Ay 0 = -57545 E-05, Ay 3 =.38363 E-05, Ay 4 = .10417 E-04
Ax = .17328 E-04 ft
Helium Properties:
Molecular weight : 4.002
Specific Heat: 1 .24 BTU/lbm °F
Thermal conductivity: 3.53 E-05 BTU/sec ft °F
Viscosity: 0.1 85 E-041bm/ft sec
Density: 0.93767 lbm/ft3
Steel Properties:
Specific Heat: 0.1 1 1 7 BTU/lbm °F
Thermal conductivity: 0.21 5 E-02 BTU/sec ft °F
Density: 51 4.28 Ibm/ft 5
P(t)= 2178.073 + 0.0349 Cos ( s / 2 + 412.5t )
Figure 1. Parallel plate regenerator.
respectively. This corresponds to a Valensi number, Va, of 1 and maximum Reynolds number, Re^a*,
of 262, and facilitates comparison of the present results with those of [9|, Figures 1 and 2, where these
quantities were 1 and 200. Recall that Va and Re^^ are defined as follows:
,, t0D|, 2 I'maxDfc
1/® ' . / R^max —
4v V
where to is the frequency, D h the hydraulic diameter, v the kinematic viscosity, and « max the maximum
over one half cycle of the average axial velocity.
- 10 -
Unlike the one dimensional numerical methods reported in (81 and (9| ( ALGAE solves the full
(unsteady) two dimensional Navier-Stokes and heat convection-conduction equations. Thus, radial
convection, conduction, and momentum effects are directly accounted for.
Figures 2 and 3 show axial Velocity and Temperature profiles as a function of the radial distance
measured from the centerline of the channel. The 12 curves shown on each figure correspond to incre-
ments of 30' in the crank angle as indicated by the accompanying key. These plots were made after a
start up transient of about 1.25 sec. They should be compared with Figures 1 and 2 of [9| which give
analogous plots of these quantities for the numerical model used in that work.
As shown in Figure 2, the time dependent behavior of the channel centerline axial velocity is
cyclic with an amplitude of about 18 ft/sec. This should be compared with the analogous result of (91,
Figure 1, where the amplitude is =13 ft/sec. The temperature variation of the Helium given by Figure
3 is approximately 49 F about a temperature of 406 F, whereas that of Figure 2 in (9) is -31 F about
408 °F. On the other hand the temperature variation in the wire shown in Figure 3 is about 2 F, while
that of Figure 2 in [9| is =14 F. Since the main mechanism for heat transfer in the Helium is convec-
tion, the 49 F variation in the gas would presumably be reduced if Re^ were reduced from 262 to
200. This will probably also reduce the variation in the wire. The two dimensional calculations do not
yield as rapid a response of the temperature at the gas-wire interface as those of [9]. This may be due
to a poor resolution of the boundary layer because of the coarseness of the spatial mesh near the inter-
face.
Velocity (ft/sec)
- 11 -
Symbol
KEY
Crank Angle (deg)
210, 30
180, 0
240, 60
150. 330
270, 90
120, 300
<£ Value (ft/sec)
17.9, -17.7
17.21. -16.94
13.9, -13.74
12.13. -11.69
6.25. -6.09
3.18, -3.21
Figure 2. Axial velocity profiles
Radial Distance (ft)
KEY
Crank Angle (deg)
210 , 30
180 , 0
240 , 60
150 , 330
270 , 90
120 , 300
<£ Value (deg F)
400.51, 413.09
411.31. 402.33
385.83, 428.5
420.40. 393.16
382.28. 431.29
428.34, 385.12
Figure 3. Temperature profiles
- 13 -
References
1. 1. S. Duff, A. M. Erisman, and J. K. Reid, Direct Methods for Sparse Matrices , Clarendon Press,
Oxford University Press, 1986.
2. UNICOS Math and Scientific Library Reference Manual SR-2081 6.0, Cray Research, 1991.
3. PSCDOC:BORMAT.DOC, Pittsburgh Supercomputing Center, May, 1991.
4. M. Heroux, P. Vu and C. Yang, "A Parallel Preconditioned Conjugate Gradient Package for Solv-
ing Sparse Linear Systems on a Cray Y-MP," Cray Research, Inc., preprint.
5. UNICOS Math and Scientific Library Reference Manual SR-2081 6.0, Vol 3, Cray Research, Inc.,
Mendota Heights, MN, 1991, pp 227-243.
6. R. S. Dougall, C. A. Hall and T. A. Porsching, "DUVAL: A Computer Program for the Numerical
Solution of Two-Dimensional, Two-Phase Flow Problems, Vols. 1-3,", Report NP-2099, Electric
Power Research Institute, Palo Alto, CA, 1982.
7. L. A. Hageman and D. M. Young, Applied Iterative Methods , Academic Press, New York, 1981.
8. M. B, Ibrahim, R. C. Tew and J. E. Dudenhoefer, "Two-dimensional Numerical Simulation of a
Stirling Engine Heat Exchanger", Proc. 24th IECEC, Paper 899536, Washington DC, 1989.
9. M. B. Ibrahim, R. C. Tew and J. E. Dudenhoefer, "Further Two-dimensional Code Development
for Stirling Space Engine Heat Components", Proc. 25th IECEC, Reno, NV, 1990, Vol. 6, pp 329-
335.
10. U. H. Kurzweg, "Enhanced Heat Conduction in Oscillating Viscous Flows Within Parallel-Plate
Channels", /. Fluid Mech. f 156,291-300, 1985.